Abstract

Zipf's law is shown to arise as the variational solution of a problem formulated in Fisher's terms. An appropriate minimization
process involving Fisher information and scale invariance yields this universal rank distribution. As an example, we show that the
number of citations found in the most referenced physics journals follows this law.
1. Introduction

This work discusses the application of Fisher's information
measure to some scale-invariant phenomena. We thus begin our
considerations with a brief review of the pertinent ingredients.

1.1. Scale-invariant phenomena

The study of scale-invariant phenomena has unravelled interesting
and somewhat unexpected behaviours in systems belonging to
disciplines of different nature, from physical and biological to
technological and social sciences [1]. Indeed, empirical data from
percolation theory and nuclear multifragmentation [2] reflect
scale-invariant behaviour, and so do the abundances of genes in
various organisms and tissues [3], the frequency of words in natural
languages [4], scientific collaboration networks [5], Internet
traffic [6], Linux packages links [7], as well as electoral results [8],
urban agglomerations [9, 10] and firm sizes all over the world [11].

The common feature in these systems is the lack of a characteristic
size, length or frequency for an observable $k$ under study. This
lack generally leads to a power-law distribution $p(k)$, valid in
most of the domain of definition of $k$,

$$p(k) \sim \frac{1}{k^{1+\gamma}}, \qquad (1)$$

with $\gamma > 0$. Special attention has been paid to the class of
universality defined by $\gamma = 1$, which corresponds to Zipf's
law in the cumulative distribution or the rank-size distribution.
Recently, Maillart et al. [7] have studied the evolution of the
number of links to open source software projects in Linux packages,
and have found that the link distribution follows Zipf's law as a
consequence of stochastic proportional growth. In its simplest
formulation, the stochastic proportional growth model, namely
geometric Brownian motion, assumes the growth of an element of the
system to be proportional to its size $k$, and to be governed by a
stochastic Wiener process. The class $\gamma = 1$ emerges from the
condition of stationarity, i.e., when the system reaches a dynamic
equilibrium [12]. Together with geometric Brownian motion, there is
a variety of models arising in different fields that yield Zipf's
law and other power laws on a case-by-case basis [10, 13], such as
preferential attachment [16] and competitive cluster growth [15] in
complex networks, used to explain many of the scale-free properties
of social, technological and biological networks.
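
The link between the exponent of the density and the rank-size form can be made concrete with a few lines of Python. This is only an illustrative sketch: it assumes an exact power-law density proportional to k^-(1+gamma) on k >= k1 (real data follow it only over part of their range), and the function names `tail_probability` and `size_at_rank` are ours, not the paper's.

```python
def tail_probability(k, gamma, k1=1.0):
    """P(K > k) for a density proportional to k**-(1 + gamma) on k >= k1.

    Assumption: the power law is exact on the whole range k >= k1.
    """
    if k < k1:
        return 1.0
    return (k1 / k) ** gamma


def size_at_rank(r, n, gamma, k1=1.0):
    """Size of the r-th largest of n elements, obtained from n * P(K > k) = r."""
    return k1 * (n / r) ** (1.0 / gamma)


if __name__ == "__main__":
    n = 100
    for gamma in (0.5, 1.0, 2.0):
        # For gamma = 1 the size is inversely proportional to the rank.
        print(gamma, [size_at_rank(r, n, gamma) for r in (1, 10, 100)])
```

For gamma = 1 the size falls off as the inverse of the rank, which is the rank-size statement of Zipf's law; other values of gamma give a log-log slope of -1/gamma instead.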

1.2. Fisher's information measure 

Much effort has recently been devoted to Fisher's information
measure (FIM), usually denoted as $I$. The work of Frieden and
co-workers [17, 18, 19, 20, 21, 22, 23, 24, 25], Silver [26], and
Plastino et al., among many others, has shed much light upon the
manifold physical applications of $I$. As a small sample we mention
that Frieden and Soffer have shown that FIM provides a powerful
variational principle, called EPI (extreme physical information),
that yields the canonical Lagrangians of theoretical physics [24].
Additionally, $I$ has been proved to characterize an arrow of time
with reference to the celebrated Fokker-Planck equation [28].
Moreover, there exist interesting relations that connect FIM and the
relative Shannon information measure invented by Kullback [31, 32].
These can be shown to have some bearing on the time evolution of
arbitrary systems governed by quite general continuity equations
[29, 30]. Additionally, a rather general $I$-based H-theorem has
recently been proved [33, 34]. As for Hamiltonian systems [35], EPI
allows one to describe the behaviour of complex systems, such as the
allometric or power laws found in biological sciences [36]. The
pertinent list could be extended quite a bit. $I$ is then an
important quantity, involved in many aspects of the theoretical
description of nature.
For our present purposes it is essential to mention that Frieden et
al. [37] have also shown that equilibrium and non-equilibrium
thermodynamics can be derived from a principle of minimum Fisher
information (MFI) with suitable constraints. Here $I$ is specialized
to the particular but important case of translation families, i.e.,
distribution functions whose form does not change under
translational transformations. In this case, the Fisher measure
becomes shift-invariant. It is shown in [37] that such a
minimization of Fisher's measure leads to a Schrödinger-like
equation for the probability amplitude, where the ground state
describes equilibrium physics and the excited states account for
non-equilibrium situations.

1.3. Goals and motivation 

Scale-invariant phenomena are generally addressed by appeal to
ad-hoc models (see the references cited in Section 1.1). In spite of
the success of these models, their intrinsic complexity makes their
study at a macroscopic level a rather difficult task. One sorely
misses a general formulation of the thermodynamics of
scale-invariant physics, which is not quite established yet. It is
our goal here to show, in such a vein, that minimization of Fisher
information provides a unifying framework that allows these
phenomena to be understood as arising from an MFI variational
principle, entirely analogous to how thermodynamics is generated in
[34].

2. Minimum Fisher Information approach (MFI) 

The Fisher information measure $I$ for a system described by a set
of coordinates $q$ and physical parameters $\theta$ has the form

$$I(F) = \int_{\Omega} dq\, F(q|\theta) \sum_{ij} c_{ij}\, \frac{\partial}{\partial\theta_i} \ln F(q|\theta)\, \frac{\partial}{\partial\theta_j} \ln F(q|\theta), \qquad (2)$$

where $F(q|\theta)$ is the density distribution in a configuration
space ($q$) of volume $\Omega$ conditioned by the physical parameters
($\theta$). The constants $c_{ij}$ account for dimensionality, and
take the form $c_{ij} = c_i \delta_{ij}$ if $q_i$ and $q_j$ are
uncorrelated. The equilibrium state of the system minimizes $I$
subject to prior conditions, like the normalization of $F$ or any
constraint on the mean value of an observable $\langle A_i \rangle$
[37]. The MFI is then written as a variational problem of the form

$$\delta \left\{ I(F) - \sum_i \mu_i \langle A_i \rangle \right\} = 0, \qquad (3)$$

where the $\mu_i$ are appropriate Lagrange multipliers.

2.1. One-dimensional system with discrete coordinate

Because of the nature of the systems to be addressed we now consider
a one-dimensional system with a physical parameter $\theta$ and a
discrete coordinate $k = k_1, k_2, \ldots, k_i, \ldots$, where
$k_{i+1} - k_i = \Delta k$ for a certain value of the interval
$\Delta k$. This scenario arises, for instance, in the case of
nuclear multifragmentation [2], the abundances of genes [3], the
frequency of words [4], scientific collaboration networks [5],
Internet traffic [6], Linux packages links [7], electoral results
[8], urban agglomerations [9, 10], firm sizes [11], etc.

In the continuous limit ($\Delta k \to dk$), the Fisher information
measure is cast as

$$I(F) = c_k \int_{k_1}^{\infty} dk\, F(k|\theta) \left( \frac{\partial \ln F(k|\theta)}{\partial\theta} \right)^2. \qquad (4)$$

Instead of using translation invariance à la Frieden-Soffer [24], we
will appeal to scaling invariance [38] so that we can anticipate
some new physics. All members of the family $F(k/\theta)$ possess
identical shape (there is no characteristic size, length or
frequency for the observable $k$), namely
$dk\, F(k/\theta) = dk'\, F(k')$ under the transformation
$k' = k/\theta$.

To deal with this new symmetry it is convenient to change to the new
coordinate $u = \ln k$ and parameter $\Theta = \ln\theta$. Why?
Because then the scale invariance becomes again translational
invariance, and we are entitled to use one essential result of [34],
namely, that MFI leads to a Schrödinger-like equation. Note that the
new coordinate $u' = \ln k'$ transforms as $u' = u - \Theta$.
Defining $f(u) = F(e^u)$ and taking into account the fact that the
Jacobian of the transformation is $|dk/du| = e^u$ and
$\partial/\partial\theta = e^{-\Theta}\, \partial/\partial\Theta$,
the Fisher information measure now acquires the form

$$I(F) = c_k\, e^{-2\Theta} \int_{u_1}^{\infty} du\, e^u f(u) \left( \frac{\partial \ln f(u)}{\partial u} \right)^2, \qquad (5)$$

where $u_1 = \ln k_1$, and the factor $e^{-2\Theta}$ guarantees the
invariance of the associated Cramér-Rao inequality as shown in [35].

For reasons that will become apparent below, we will apply the MFI
without any constraint. This is tantamount to posing no bound on the
physical "sizes" that characterize the system. The extremization of
Fisher information with no constraints ($\mu_i = 0$) is written as

$$\delta \int_{u_1}^{\infty} du\, e^u f(u) \left( \frac{\partial \ln f(u)}{\partial u} \right)^2 = 0. \qquad (6)$$

Introducing $f(u) = e^{-u}\Psi^2(u)$, the integrand becomes
$(2\,\partial\Psi/\partial u - \Psi)^2$; varying with respect to
$\Psi$ and $\partial\Psi/\partial u$ as in [37], one is easily led
to a (real) Schrödinger-like equation of the form

$$\left[ -4\, \frac{\partial^2}{\partial u^2} + 1 \right] \Psi(u) = 0. \qquad (7)$$

Notice that the lack of normalization constraints implies zero
eigenvalue, since the Lagrange multiplier associated with the
normalization is the energy eigenvalue [37]. At this point we
introduce boundary conditions to guarantee convergence of the Fisher
measure and thus compensate for the lack of constraints in (6). We
impose $\lim_{u\to\infty} \Psi(u) = 0$ and $\Psi(u_1) = \sqrt{N}$,
where $N$ is a dimensionless constant the meaning of which will
become clear later. The solution to (7) with these boundary
conditions is $\Psi(u) = \sqrt{N}\, e^{-(u-u_1)/2}$, which leads to
$f(u) = N e^{-(2u-u_1)}$ and to the density distribution

$$F(k)\,dk = N\, \frac{k_1}{k^2}\, dk, \qquad (8)$$

with $N = 1$ for a density normalized to unity. This distribution is
just Zipf's law (universal class $\gamma = 1$) of Refs. [10, 12].
This result is remarkable: Zipf's law has been here derived from
first principles.
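
The closed-form solution can be checked numerically. The sketch below is not part of the original derivation: it assumes the amplitude Psi(u) = sqrt(N) exp(-(u - u1)/2) and the density F(k) = N k1 / k^2, and the helper names, step sizes and the cutoff `k_high` are illustrative choices of ours. It evaluates the residual of -4 Psi'' + Psi by central finite differences and integrates F(k) above a given size with a midpoint rule in u = ln k.

```python
import math


def psi(u, u1=0.0, n=1.0):
    """Assumed amplitude: sqrt(N) * exp(-(u - u1) / 2)."""
    return math.sqrt(n) * math.exp(-(u - u1) / 2.0)


def schrodinger_residual(u, u1=0.0, n=1.0, h=1e-3):
    """Residual of -4 * Psi'' + Psi, with Psi'' from central differences."""
    second = (psi(u + h, u1, n) - 2.0 * psi(u, u1, n) + psi(u - h, u1, n)) / h**2
    return -4.0 * second + psi(u, u1, n)


def density(k, k1=1.0, n=1.0):
    """Assumed density F(k) = N * k1 / k**2 on k >= k1."""
    return n * k1 / k**2


def mass_above(k_low, k1=1.0, n=1.0, k_high=1e8, steps=200_000):
    """Integral of F(k) dk from k_low to k_high (midpoint rule in u = ln k)."""
    u_low, u_high = math.log(k_low), math.log(k_high)
    h = (u_high - u_low) / steps
    total = 0.0
    for i in range(steps):
        k = math.exp(u_low + (i + 0.5) * h)
        total += density(k, k1, n) * k * h  # dk = k du
    return total


if __name__ == "__main__":
    print("residual at u = 1:", schrodinger_residual(1.0))
    print("mass above k1:", mass_above(1.0))
```

With N = 1 and k1 = 1 the residual should vanish up to discretization error and the mass above k1 should approach unity as the cutoff grows, in line with the normalization stated in the text.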
3. Applications

A common representation of empirical data is the so-called rank-plot
or Zipf plot [12], where the $j$th element of the system is
represented by its size, length or frequency $k_j$ against its rank,
sorted from the largest to the smallest one. This process just
renders the inverse function of the ensuing cumulative distribution,
normalized to the number of elements. We call $r$ the rank, which
ranges from 1 to $N$. Thus, the constant $N$ arising from the
boundary conditions is the total number of elements considered in
building up the distribution, as will be illustrated in the examples
below. This rank distribution takes the form

$$k(r) = N\, \frac{k_1}{r}, \qquad (9)$$

which yields a straight line in a logarithmic representation with
slope $-1$.
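
A rank-plot slope of this kind is commonly estimated by a least-squares fit in log-log coordinates. The following sketch shows one simple way to do it; it is not the procedure used for the figures, and the function names and the synthetic data (generated from the assumed form k(r) = N k1 / r) are purely illustrative.

```python
import math


def zipf_rank_size(n, k1=1.0):
    """Ideal rank-size sequence k(r) = n * k1 / r for r = 1..n (synthetic data)."""
    return [n * k1 / r for r in range(1, n + 1)]


def loglog_slope(sizes):
    """Least-squares slope of log(size) against log(rank).

    `sizes` must be sorted from largest to smallest, so that the
    element at index j has rank j + 1.
    """
    xs = [math.log(r) for r in range(1, len(sizes) + 1)]
    ys = [math.log(k) for k in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    sizes = zipf_rank_size(100)
    print("fitted slope:", loglog_slope(sizes))
```

Applied to real data, the list passed to `loglog_slope` would be the observed sizes sorted in decreasing order; deviations of the fitted slope from -1 then measure the departure from Zipf's law.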



In Fig. 1a we depict the known behavior [12] of the rank-size
distribution for the 100 largest cities of the United States [39],
which shows a slope near $-1$ ($\gamma = 1$) in the logarithmic
representation of the rank-plot.

We have also studied the system formed by the most referenced
physics journals [40], using their total number of cites as
coordinate $k$. If a journal receives more cites due to its
popularity, it becomes even more popular and, therefore, receives
still more cites, etc. Under such conditions, proportional growth
and scale invariance are expected, as we depict in Fig. 1b, where
the slope's value can be regarded as illustrating the universality
of the underlying law.

Figure 1: (a) Rank-plot of the 100 largest cities of the United States, from
most populated to least populated, on a logarithmic scale. (b) Rank-plot of the
total number of cites of the 30 most cited physics journals, from most cited to
least cited, on a logarithmic scale. Horizontal axis in both panels: r (rank).

4. Conclusions



We have here shown that Zipf's law results from the scaling
invariance of the Cramér-Rao inequality derived in [35]. This
entails that the relevant probability distribution, usually called 
the rank-distribution, has to be size-invariant. Consequently, 
it should be derivable from a minimization process in which 
Fisher's information measure is the protagonist. No constraints 
are needed in the concomitant variational problem because, a 
priori, our sizes have no upper bound. A physical analogy is 
the non-normalizability of plane waves. The universal character 
of our demonstration thus resides in the universal form to be 
minimized (Fisher's), with no constraints. 